VIROLOGY 195, 680-691 (1993) 


Nucleotide Sequence of the Human Coronavirus 229E RNA Polymerase Locus 


J. HEROLD, T. RAABE, B. SCHELLE-PRINZ, AND S. G. SIDDELL


Institute of Virology, University of Würzburg, Versbacher Str. 7, 8700 Würzburg, Germany


Received February 11, 1993; accepted April 12, 1993 


The nucleotide sequence of the human coronavirus 229E (HCV 229E) RNA polymerase gene and the 5' region of the
genome has been determined. The polymerase gene is comprised of two large open reading frames, ORF 1a and
ORF 1b, that contain 4086 and 2687 codons, respectively. ORF 1b overlaps ORF 1a by 43 bases in the (-1) reading
frame. The in vitro translation of SP6 transcripts which include HCV 229E sequences encompassing the ORF 1a/ORF 1b
junction shows that expression of ORF 1b can be mediated by ribosomal frame-shifting. The predicted translation
products of ORF 1a (454,200 molecular weight) and ORF 1a/1b (754,200 molecular weight) have been compared to the
predicted RNA polymerase gene products of infectious bronchitis virus (IBV) and murine hepatitis virus (MHV) and
conserved structural features and putative functional domains have been identified. This analysis completes the
nucleotide sequence of the HCV 229E genome.


INTRODUCTION 


The human coronaviruses (HCV) are the cause of
upper respiratory illness and it has been estimated that
up to 20% of common colds are caused by HCV (McIntosh
et al., 1974; Isaacs et al., 1983; Macnaughton et
al., 1983). Although the symptoms of HCV-related
colds are generally mild and the duration of illness is
short, the economic consequences of HCV infection
are significant (Hierholzer and Tannock, 1988). Also,
the possible association of HCV infection with more
severe respiratory tract illness in children (Matsumoto
and Kawano, 1992) or as a precipitant of asthmatic
exacerbations (Pattemore et al., 1992) needs to be
further investigated.

It has been established that there are two major antigenic
groups of HCV, represented by the prototypes
HCV 229E and HCV OC43. The major structural components
of HCV 229E and HCV OC43 virions have
been identified and there is some limited information
on the synthesis of viral RNA and proteins in the infected
cell (Schmidt and Kenny, 1982; Schmidt, 1984;
Kemp et al., 1984; Hogue and Brian, 1986; Schreiber
et al., 1989; Raabe et al., 1990; Arpin and Talbot,
1990).

The HCV 229E genome is a positive-strand RNA
with an estimated size of 6 × 10^6 (Macnaughton and
Madge, 1978). To date, the nucleotide sequence of 
approximately 7 kilobases extending from the 3’ end of 
the genome has been determined. This region encodes
the nucleocapsid protein, N (Schreiber et al.,
1989; Myint et al., 1990), the membrane glycoprotein,
M (Raabe and Siddell, 1989b; Jouvenne et al., 1990)
and the surface glycoprotein, S (Raabe et al., 1990).
Additionally, there are three small open reading frames
(ORFs), ORF4a, ORF4b, and ORF5, located between
the S and the M protein genes (Raabe and Siddell,
1989a; Jouvenne et al., 1992). It seems likely that the
putative HCV 229E ORF5 gene product is a virion structural
protein (Liu and Inglis, 1991; Godet et al., 1992)
but the function of the putative ORF4a and ORF4b
gene products is unknown.

In coronavirus-infected cells, the viral genes are expressed
from the genomic and subgenomic mRNAs.
The subgenomic mRNAs form a 3' coterminal set and
are synthesized by a process of discontinuous transcription
(for a recent review see Lai, 1990). In the case
of HCV 229E, seven positive-strand RNA species (numbered
1 to 7 in order of decreasing size) have been
identified in the infected cell. The translation products
of the S, ORF4a and 4b, ORF5, M, and N protein genes
have been provisionally assigned to RNA 2, 4, 5, 6, and
7, respectively (Raabe et al., 1990; Schreiber et al.,
1989), although messenger RNA function has been
confirmed only for RNA 7 (Myint et al., 1990). It is not
yet clear whether RNA 3 should be considered as a
putative mRNA (Raabe et al., 1990).

The remainder of the HCV 229E genome encompasses
the unique region of RNA 1, i.e., the genomic
RNA. This region, which is referred to as gene 1, the
RNA polymerase gene or the RNA polymerase locus,
has been entirely sequenced for IBV and MHV (Boursnell
et al., 1987; Bredenbeek et al., 1990; Lee et al.,
1991) and is comprised of two large ORFs, ORF 1a and
ORF 1b, which overlap by 40-80 bases. The upstream
ORF 1a potentially encodes a polypeptide of 450,000
to 600,000 molecular weight. The downstream ORF 1b
potentially encodes a polypeptide of 300,000 molecular
weight. The downstream ORF 1b is expressed, however,
as a fusion protein together with the ORF 1a gene
product by a mechanism involving (-1) ribosome slippage
(Brierley et al., 1987). This ribosomal frameshift is
mediated by a "slippery sequence" and pseudoknot
structure located in a region of the genome encompassing
the overlap of ORF 1a and ORF 1b (Brierley et
al., 1989, 1991, 1992; Bredenbeek et al., 1990; Lee et
al., 1991).

Fig. 1. Organization of the HCV 229E genome and the position of cDNA clones encompassing gene 1. The major ORFs are represented as
boxes in the 0, -1, and -2 reading frames. The known and putative structural genes (S, 5, M, and N) are shaded and gene 1 (Pol1a and Pol1b) is
cross-hatched. The size and position of the cDNA clones and PCR amplification products used to determine the HCV 229E gene 1 sequence are
shown. The relationship of the intracellular poly(A) RNA species to the genome is also illustrated.

In the case of IBV and MHV, the ORF 1b regions are
relatively conserved whereas the ORF 1a regions have
diverged, in particular toward their 5' ends. It is evident
that these two large ORFs must encode a number of
different functions. First, there are functions related to
RNA replication. Complementation analysis of MHV ts
mutants with an RNA-minus phenotype has shown that
there are at least five distinct viral functions related to
RNA synthesis (Leibowitz et al., 1982; Schaad et al.,
1990). Analysis of these mutants by genetic recombination
allows the different functions to be located and
ordered within the gene 1 locus (Keck et al., 1987;
Baric et al., 1990). Also, both IBV and MHV contain in
their ORF 1b sequence motifs characteristic of RNA
polymerases, helicases, and metal-binding proteins
(Gorbalenya et al., 1989; Bredenbeek et al., 1990).


Second, there is genetic and biochemical evidence
that the MHV gene 1 contains virus-encoded proteases.
The complementation frequencies of MHV ts mutants
are indicative of intergenic, rather than intragenic
complementation (Leibowitz et al., 1982) and an autoproteolytic
activity has been mapped to the middle of the
MHV ORF 1a (Baker et al., 1989). Motifs characteristic
of both papain-like and picornavirus 3C-like cysteine
proteases have also been identified in ORF 1a of MHV
and IBV (Gorbalenya et al., 1989; Lee et al., 1991).

Finally, the large size of the coronavirus gene 1 region
(approximately 20 kilobases) suggests that it may
encode many, as yet unidentified, functions. One obvious
candidate would be a methyltransferase activity
necessary for the generation of capped viral RNA in the
cytoplasm of infected cells. Other functions may be
related to the conserved "membrane protein," "cysteine-rich,"
and "X" domains which have been identified
in gene 1 of IBV and MHV (Gorbalenya et al., 1989;
Lee et al., 1991).

In this paper we report the nucleotide sequence of
the human coronavirus 229E gene 1 and the 5' region
of the genome. This analysis completes the nucleotide
sequence of HCV 229E. Furthermore, we provide evidence
that, in common with IBV and MHV, HCV 229E
ORF 1b expression is mediated by (-1) ribosomal
frame-shifting. The identification of structural and putative
functional motifs in the predicted HCV gene 1
product and a comparison of their organization in the
gene 1 proteins of HCV 229E, IBV, and MHV is also
presented.

Fig. 2. Consensus sequence of cDNA clones representing the 5'
region of the HCV 229E genome. (A) The consensus sequence. The
intergenic motif, UCUCAAC, is underlined. ORF 1a is initiated with
AUG at position 293 and the conserved 5' "minicistron" is indicated.
(B) Sequence analysis of the 5' RACE products. The sequences of the
A-tailed and T-tailed 5' RACE products derived from the HCV leader
RNA were determined as described in Methods. The 5' terminal A
nucleotide is indicated.


MATERIALS AND METHODS 


Virus and cells 


The HCV 229E isolate used in these studies, the
methods of virus propagation in C16 cells, and the isolation
of cytoplasmic, poly(A)-containing RNA from
HCV 229E-infected cells have been described (Raabe
et al., 1990).


cDNA cloning 


cDNA synthesis was done by the method of Gubler
and Hoffman (1983) using random hexanucleotides or
the HCV 229E S gene-specific oligonucleotide 1
(Raabe et al., 1990) as reverse transcription primers.
The synthesized double-stranded cDNA was size-fractionated
on a Sephacryl S-1000 column, cloned into
pBluescript II KS+, and transformed into competent
Escherichia coli TG-1 cells. Recombinant clones were
screened by colony hybridization with HCV 229E-specific,
32P-labeled oligonucleotides or HCV 229E-specific
cDNAs. Standard recombinant DNA procedures
were done as described by Sambrook et al. (1989) and
colony hybridizations were done as described by
Woods (1984).


PCR 


PCR was done using a GeneAmp/RNA PCR Kit according
to the manufacturer's procedures (Perkin-Elmer
Cetus, Überlingen, Germany). The biotinylated
oligonucleotide 2 was used as upstream primer and
oligonucleotide 3 was used as downstream and reverse
transcription primer. The resulting cDNA strands
were separated using streptavidin-coupled magnetic
beads, according to the manufacturer's protocol
(Dynal, Hamburg, Germany) and the nucleotide sequence
of both strands was determined.


Fig. 3. Dot matrix comparisons of the predicted amino acid sequences
of the ORF 1a proteins of HCV 229E, IBV, and MHV. Comparisons
of the HCV and IBV proteins (upper panel) and the HCV and
MHV proteins (lower panel) were generated using the GCG program
COMPARE (window, 100; stringency, 30; default comparison table)
and displayed with the program DOTPLOT. Axes: IBV ORF 1a, MHV ORF 1a, and HCV ORF 1a.


[Fig. 4, alignment panels. (A) HCV (1041-1234), HCV (1688-1886), MHV (1720-1919), IBV (1260-1461), and MHV (1125-1315).
(B) MHV (3350-3654), HCV (2965-3268), and IBV (2779-3088). (C) HCV (3933-4069), MHV (4235-4475), and IBV (3783-3929).
The aligned residues are not legible in this copy.]

Fig. 4. Putative functional domains of the HCV 229E ORF 1a translation product. The amino acid sequences of ORF 1a of HCV 229E, MHV, and
IBV were aligned using the UWGCG program PileUp (default settings) and the structgappep.cmp (A) or pam250.cmp (B and C) comparison
tables. (A) The papain-like protease motifs; (B) the 3C-like protease motif; (C) the growth factor/receptor-like motif. In (A) and (B) the catalytic
residues proposed by Lee et al. (1991) are shown in bold type and marked with an asterisk. In (C) the putative disulphide bond residues proposed
by Gorbalenya et al. (1989) are similarly highlighted. The numbering of the aligned sequences is for reference only.


DNA sequencing 


Sequencing was done on single-strand and double-strand
templates using the chain termination method
and M13, T7, T3, and HCV 229E-specific sequencing
primers. To generate cDNA sequencing templates,
overlapping deletions were introduced by unidirectional
exonuclease III digestion (Henikoff, 1984). Both
strands of all cDNAs and the PCR product were sequenced.
Sequence data were assembled by the program
of Staden (1982) and analysed by the programs
of the Genetics Computer Group, Inc. (Devereux et al.,
1984).


5’ RACE 


Sequences at the 5' end of the HCV 229E leader
RNA were determined by a "rapid amplification of
cDNA ends" method (Frohmann et al., 1988). A 32P-labeled
oligonucleotide, 4, complementary to a region of
the HCV 229E leader RNA (Schreiber et al., 1989), was
used as primer for the reverse transcription of cytoplasmic,
poly(A)-containing RNA from HCV-infected
cells. Reverse transcription was done with Superscript
RNase H- reverse transcriptase (Gibco, Eggenstein,
Germany) using the manufacturer's protocol. The largest
product was purified by gel electrophoresis and
tailed with dATP or dTTP using terminal transferase.
The tailed cDNAs were then amplified in separate
three-primer PCRs (A-tailed product: oligonucleotides 4 and
5 and biotinylated oligonucleotide 2; T-tailed product:
oligonucleotides 4 and 6 and biotinylated oligonucleotide
2). The amplifications were done using AmpliTaq
DNA polymerase (Perkin-Elmer Cetus) by heating the
reaction to 94°, followed by 3 cycles of denaturation
(94°, 1 min), annealing (45°, 1 min), and extension
(72°, 5 sec), 30 cycles of denaturation (94°, 1 min),
annealing (51°, 1 min), and extension (72°, 5 sec), and a
final extension step of 72° for 10 min. The cDNA
strands were separated using streptavidin-coupled
magnetic beads and the biotinylated strand was sequenced
using primer 4.


Fig. 5. Analysis of HCV RNA-mediated ribosomal frame-shifting. (A) The consensus sequence of cDNA clones in the region of the ORF 1a/
ORF 1b overlap. The ends of the ORF 1a and ORF 1b sequences are indicated. The putative slippage site, TTTAAAC, is shown in bold type and
the complementary sequences which we propose to form the S1 and S2 stems are overlined. (B) A proposed model of the HCV 229E
pseudoknot structure at the ORF 1a-ORF 1b junction. (C) The structure of plasmids pFS and pAFS. The DNA structure of pFS and pAFS is
schematically shown together with the position of the HCV 229E ORF 1a/ORF 1b overlap. The size of the SP6 run-off transcription products and
the translation products predicted in the event of ORF 1a termination or (-1) ribosomal frame-shifting are shown. (D) In vitro translation products
of pFS and pAFS mRNA. Lane M, molecular weight markers (CFA626, Amersham Buchler, Braunschweig, Germany); lane 1, no RNA; lane 2,
pAFS/BamHI RNA; lane 3, pFS/AflII RNA; lane 4, pFS/BstEII RNA.


Oligonucleotides


Oligonucleotides were synthesized using phosphoamidite
chemistry on a Cyclone DNA synthesizer
and purified by gel electrophoresis. The 5' biotinylated
oligonucleotide was purchased (MWG-Biotech, Ebersberg,
Germany). The oligonucleotides used for cDNA
synthesis, PCR, and 5' RACE were

1. 5'-CAT CTA CAA CAG ATG AGG-3'

2. 5'-BIOTIN GCC TAT GAA AGT GCT GTT GTT AAT GG-3'

3. 5'-TTA GAT TTA AGA ACA GCC TGT GAC GC-3'

4. 5'-GTA GAC ACA AAG TCT AAA AAG C-3'

5. 5'-GCC TAT GAA AGT GCT GTT GTT AAT GG(T)n-3'

6. 5'-GCC TAT GAA AGT GCT GTT GTT AAT GG(A)n-3'.


Construction of plasmids pFS and pAFS 


A 1264 base pair NdeI-HpaI fragment of clone
T16D8 (corresponding to bases 12,293-13,557 in the
HCV genome, see Fig. 1), or a 427 base pair NdeI-Sau96I
fragment of clone T16D8 (corresponding to
bases 12,293-12,720) was treated with the Klenow
fragment of DNA polymerase and exchanged with the
small (230 base pair) EcoRV fragment of pSP65-GUS
(Prüfer et al., 1992). The clones containing the HCV
DNA fragments in the correct orientation (pFS and
pAFS, respectively) were identified by restriction enzyme
analysis and the constructions were verified by
sequencing.


In vitro transcription and translation


Plasmid DNA was linearized with AflII or BstEII (pFS)
or BamHI (pAFS) and transcribed with SP6 RNA polymerase
as described by Melton et al. (1984). The in
vitro synthesized, capped RNAs were translated in a
rabbit reticulocyte lysate in the presence of [35S]methionine
and the products were analyzed on 10%
polyacrylamide-SDS gels as described previously (Siddell,
1983). The radioactivity incorporated into the
translation products was determined using a PhosphorImager
Model 400E (Molecular Dynamics, Sunnyvale,
USA).


RESULTS 


Molecular cloning and nucleotide sequence of the 
HCV 229E gene 1 


The HCV 229E gene 1 was cloned in a series of 21
cDNA clones. One region, corresponding to bases
5781-5934 in the genomic RNA, was not represented
in the three cDNA libraries screened and it was sequenced
by PCR techniques. More than 85% of the
sequence presented was determined on two or more
cDNA clones. The sequence encompassing the
"frameshift region" (see below) was obtained on five
independent cDNA clones from two cDNA libraries
(Fig. 1). The length of the consensus cDNA sequence
is 20,774 nucleotides, extending from base 1 at the 5'
end of the genome to base 20,774, which corresponds
to the 68th codon of the HCV S protein gene. Within
this sequence are two large ORFs. ORF 1a is initiated
with an AUG at base 293 and contains 4086 codons.
ORF 1b, which is initiated at base 12,508 with CAG,
contains 2687 codons and overlaps ORF 1a by 43
bases in the (-1) reading frame. The nucleotide sequence
of the HCV 229E gene 1 has been deposited
with the EMBL/GenBank/DDBJ nucleotide sequence
Data Libraries and is available under accession number
X69721.
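
The coordinates quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis; it assumes 1-based, inclusive base numbering and that the stated codon counts include the termination codon (consistent with the 4085-residue ORF 1a product reported below). The function names are illustrative only.

```python
# Coordinates as reported in the text (1-based, inclusive numbering assumed).
ORF1A_START = 293
ORF1A_CODONS = 4086   # assumed to include the termination codon
ORF1B_START = 12508
ORF1B_CODONS = 2687   # assumed to include the termination codon


def orf_end(start, codons):
    """Return the last base of an ORF that begins at `start` and spans `codons` codons."""
    return start + 3 * codons - 1


def relative_frame(start_a, start_b):
    """Return the reading frame of ORF b relative to ORF a as 0, +1, or -1."""
    offset = (start_b - start_a) % 3
    # An offset of 2 bases downstream is equivalent to stepping back 1 base.
    return {0: 0, 1: 1, 2: -1}[offset]


orf1a_end = orf_end(ORF1A_START, ORF1A_CODONS)
orf1b_end = orf_end(ORF1B_START, ORF1B_CODONS)
overlap = orf1a_end - ORF1B_START + 1
frame = relative_frame(ORF1A_START, ORF1B_START)

print(orf1a_end, orf1b_end, overlap, frame)
```

Under these assumptions ORF 1a ends at base 12,550, the overlap with ORF 1b is 43 bases, and ORF 1b lies in the (-1) frame relative to ORF 1a, in agreement with the values given in the text.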


5’ Region of the genome 


The consensus sequence of cDNAs which encom- 
pass the region of the HCV 229E genome preceding 
the ORF 1a initiation codon was deduced from the se- 
quence of cDNA clones J12E6 and T35D5 together 
with 5’ RACE clones produced from poly(A)-containing 
RNA (Fig. 2). The validity of the HCV 229E sequence, 
therefore, depends upon the assumption that the 
mRNA leader sequence is equivalent to the 5’ end of 
the genomic RNA. This equivalence has been demon- 
strated for MHV (Shieh et al., 1987).

The genomic sequence begins with an adenine. At
position 62-68 the sequence UCUCAAC is found. This
or a closely related sequence is located adjacent to the
5' end of all HCV 229E genes and represents the so-called
"intergenic consensus" sequence which is believed
to have a pivotal role in the discontinuous transcription
of coronavirus mRNAs (Joo and Makino,
1992). At position 86 a short ORF of 12 codons is initiated
with AUG. This ORF would be unremarkable except
that similar ORFs are conserved in the genomes
of IBV and MHV (Boursnell et al., 1987; Soe et al.,
1987). It might be speculated that this ORF has a role,
for example, in the regulation of the initiation of protein
synthesis from genomic RNA.

Fig. 6. Dot matrix comparisons of the predicted amino acid sequences
of the ORF 1b proteins of HCV 229E, IBV, and MHV. Comparisons
of the HCV and IBV proteins (upper panel) and the HCV and
MHV proteins (lower panel) were generated using the GCG program
COMPARE (window, 100; stringency, 30; default comparison table)
and displayed with the program DOTPLOT.


ORF1a


Structural features. ORF 1a has the potential to encode
a protein of 4085 amino acids with a predicted
molecular weight of 454,200. The hydrophilicity profile
of the predicted protein (data not shown) shows several
regions in which hydrophobic residues predominate.
Particularly striking are the regions encompassed
by amino acids 2720-2890 and 3270-3510. These
regions represent potential membrane spanning domains.

A comparison of the predicted HCV ORF 1a protein 
with the corresponding proteins of IBV and MHV using 
the GCG program GAP (default settings) indicates 
51.2% similarity (27.3% identity) between the HCV and
IBV proteins and 51.4% similarity (28.0% identity) for 
the HCV and MHV proteins after optimal alignment. A 
more detailed analysis using the COMPARE program 
(window, 100; stringency, 30; default comparison ta- 
ble) illustrates that in all three proteins the regions of 
greatest similarity are located in the carboxy-terminal 
half of the molecule (Fig. 3). 
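
The window/stringency principle behind such a dot matrix comparison can be illustrated with a short sketch. This is not the GCG COMPARE implementation: COMPARE scores residue pairs with a comparison table, whereas the simplified version below only counts identical residues, and the function name and toy sequences are hypothetical.

```python
def dot_matrix(seq_a, seq_b, window=100, stringency=30):
    """Return (i, j) window start positions at which two sequences are similar.

    For every pair of windows of length `window`, the score is the number of
    identical residues at matching offsets (an identity-only simplification).
    A dot is recorded when the score reaches `stringency`.
    """
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            score = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if score >= stringency:
                hits.append((i, j))
    return hits


# Toy example with a short window; the sequences are invented for illustration.
toy_hits = dot_matrix("MKVLAGTWQ", "AAMKVLSGT", window=5, stringency=4)
print(toy_hits)
```

In the toy example the hits fall on a single diagonal offset by two positions, which is how a conserved, collinear region appears in a dot plot.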

Putative functional domains. The predicted ORF1a 
proteins of IBV and MHV have been analyzed in detail 
and motifs which are thought to represent domains 
with specific functions have been identified (Gorbalenya
et al., 1989; Lee et al., 1991; Bredenbeek et al.,
1990). This analysis can be extended by comparison of 
the predicted HCV 229E ORF 1a protein with those of 
IBV and MHV and the results of this analysis are shown 
in Fig. 4. 

The first domains which can be recognized in the
HCV protein display motifs indicative of papain-like
proteases. In HCV 229E, as in MHV, two such motifs 
are found, located between amino acids 1041-1234 
and 1688-1886 (Fig. 4A). The most characteristic fea- 
ture of these motifs is the conserved putative catalytic 
Cys and His residues located at positions 1054 and 
1701 and 1305 and 1663, respectively, in the HCV 
protein. 

The second motif identified in the predicted HCV 
protein is related to the picornavirus 3C-like protease 
domain. This motif is located between amino acids 
2965-3265 (Fig. 4B). It should be noted that the fea- 
tures which distinguish the coronavirus 3C-like motif 
from other 3C-like protease motifs (a Gly → Tyr substitution
in the vicinity of the proposed catalytic Cys resi-
due and the absence of a conserved Asp/Glu as a third 
catalytic site residue) are maintained in the predicted 
HCV protein. 

The third HCV ORF 1a motif which has been identi- 
fied is a cysteine-rich domain located between amino 
acids 3933-4069 (Fig. 4C). This motif has been recog- 
nized in the MHV and IBV genomes and is related to 
motifs found in growth factors and their receptors. 


The frame-shifting region


By analogy to IBV and MHV it seems likely that expression
of the HCV ORF 1b is mediated by a (-1) ribosomal
frame-shifting event during translation of the genomic
RNA in the region of the ORF 1a/ORF 1b overlap.
The consensus sequence of the cDNA clones which
encompass this region is shown in Fig. 5A. The sequence
UUUAAAC which is found at position 12,514-12,520
in the HCV sequence, 27 bases upstream of
the ORF 1a termination codon, is identical to the slippage
site of IBV (Brierley et al., 1992) and the putative
slippage site of MHV-JHM (Lee et al., 1991) and MHV-A59
(Bredenbeek et al., 1990). The HCV sequences 3'
of this site can also be folded into a tertiary RNA structure,
the pseudoknot, which is the second element required
for efficient frame-shifting (Brierley et al., 1989). Figure
5B illustrates a pseudoknot structure in which the stem
S1 is formed by base pairing of nucleotides 12,528-12,537
and 12,546-12,555 and the stem S2 is formed
by base pairing of nucleotides 12,541-12,545 and
12,723-12,727 (overlined in Fig. 5A). This structure
would necessitate an L1 loop of 3 bases and an L2
loop of 168 bases, values that exceed the minimum
required length (Brierley et al., 1991). Clearly, further
experimental evidence will be needed to confirm or refute
this model.

Fig. 7. Putative functional domains of the HCV 229E ORF 1b translation product. The amino acid sequences of ORF 1b of HCV 229E, MHV, and
IBV were aligned using the UWGCG program PileUp (default settings) and the pam250.cmp comparison table. (A) The RNA polymerase domain,
(B) the metal binding domain, (C) the helicase domain. In (A) the characteristic GDD (SDD) motif is highlighted and the polymerase domains I to
VIII (Koonin, 1991) are shaded. In (B) the conserved Cys/His residues which may be involved in metal ion ligation (Lee et al., 1991) are
shown in bold type and marked with an asterisk. In (C) the characteristic "A" and "B" sites (Gorbalenya and Koonin, 1989) are shown and
conserved residues are similarly highlighted. The numbering of the aligned sequence is for reference only.

Fig. 8. Position of the putative functional domains in the gene 1 translation products of HCV 229E, MHV, and IBV. The figure is drawn to scale
although the boundaries of the motifs cannot be defined precisely. PLP, papain-like protease; 3CL, 3C-like protease; GFL, growth factor/receptor-like;
POL, polymerase module; MBD, metal binding domain; HEL, helicase (NTP-binding) domain. The arrow indicates the position of the
ORF 1a/ORF 1b junction.

To confirm that the HCV 229E ORF 1a/ORF 1b overlap
region is able to mediate (-1) ribosomal frame-shifting
we constructed two plasmids for the in vitro
transcription of mRNA (Fig. 5C). Plasmid pFS contains
the putative frame-shifting region (nucleotides
12,293-13,557) flanked by and in frame with DNA encoding
the amino- and carboxy-terminal regions of the
E. coli β-glucuronidase (GUS) protein. Plasmid pAFS
was identical, except that the HCV 229E sequences
extended only from position 12,293 to 12,720, i.e., did
not include the pentanucleotide sequence CGAGC
which is complementary to the sequence GCUCG
which we propose to be in the S2 stem of the pseudoknot
structure. The plasmids were linearized with AflII
or BstEII (pFS) or BamHI (pAFS) and capped SP6 run-off
transcripts were synthesized in vitro. The transcripts
were translated in rabbit reticulocyte lysate and the
results are shown in Fig. 5D.

The pFS/AflII transcript directed the synthesis of
34,000 and 49,000 molecular weight proteins. The
pFS/BstEII transcript directed the synthesis of 34,000
and 66,000 molecular weight proteins and the pAFS/BamHI
transcript directed the synthesis of a 34,000
molecular weight protein. By reference to Fig. 5C, it
can be seen that these are the results expected if the
HCV sequence of pFS mediates (-1) ribosomal frame-shifting
and the proposed pentanucleotide base-pairing
interaction is necessary to produce a functional
frame-shifting element. A quantitative PhosphorImager
analysis of the data shown in Fig. 5D indicates that in
the pFS transcripts frame-shifting occurs at a frequency
between 18 and 30%.
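
The text does not state how the frame-shifting frequency was derived from the band intensities. One common approach for [35S]methionine-labeled products, sketched below purely as an assumption, is to divide each band signal by the number of methionine residues in that product and to express the frame-shifted product as a fraction of the total. The function name, the signal values, and the methionine counts are all hypothetical and are not taken from this study.

```python
def frameshift_frequency(stop_signal, stop_met, fs_signal, fs_met):
    """Estimate the frame-shifting frequency from two labeled translation products.

    stop_signal / fs_signal: band intensities of the terminated and the
        frame-shifted product (arbitrary units).
    stop_met / fs_met: number of methionine residues in each product, used to
        convert incorporated label into relative molar amounts (assumed
        normalization; not described in the source).
    """
    stop_molar = stop_signal / stop_met
    fs_molar = fs_signal / fs_met
    return fs_molar / (stop_molar + fs_molar)


# Hypothetical numbers chosen only to show the arithmetic.
example = frameshift_frequency(stop_signal=700, stop_met=7, fs_signal=300, fs_met=12)
print(f"{example:.0%}")
```

With these invented values the terminated product corresponds to 100 relative molar units and the frame-shifted product to 25, giving a frequency of 20%.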

Careful analysis of the translation products directed
by the pAFS/BamHI transcript reveals a protein of
73,000 molecular weight, which would be expected if
(-1) frame-shifting has occurred. The amount of protein
synthesized represents a frame-shifting frequency
of <1%. We believe this could be explained by a less
stable S2 stem formed between the GCUCG pentanucleotide
at position 12,541-12,545 and the sequence
CGUGC located at nucleotides 12,586-12,590. Further
studies are required to confirm this interpretation.
We have also noted that in all translations the N-GUS-HCV
ORF 1a product, predicted to have a molecular
weight of 30,700, has a slower electrophoretic mobility
than expected. At the moment we have no explanation
for this anomaly.
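
The base-pairing argument for the two candidate S2 partners can be checked directly. The sketch below counts only Watson-Crick pairs between two antiparallel strands and ignores G-U wobble pairs and stacking energies, so it is an illustration of the complementarity, not a stability calculation. The function names are illustrative.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def watson_crick_pairs(strand_a, strand_b):
    """Count Watson-Crick pairs when two equal-length strands align antiparallel.

    Both strands are written 5' to 3'; strand_b is reversed so that its 3' end
    faces the 5' end of strand_a.
    """
    return sum(
        1 for a, b in zip(strand_a, reversed(strand_b)) if COMPLEMENT[a] == b
    )


s2_upstream = "GCUCG"      # nucleotides 12,541-12,545
s2_proposed = "CGAGC"      # nucleotides 12,723-12,727, absent from pAFS
s2_alternative = "CGUGC"   # nucleotides 12,586-12,590, present in pAFS

print(reverse_complement(s2_upstream))
print(watson_crick_pairs(s2_upstream, s2_proposed))
print(watson_crick_pairs(s2_upstream, s2_alternative))
```

The proposed partner is the exact reverse complement of GCUCG and forms five pairs, whereas the alternative CGUGC forms four pairs around a central U-U mismatch, which fits the suggestion of a less stable S2 stem.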


ORF1b 


Structural features. ORF1b has the potential to en- 
code a protein of 2686 amino acids with a molecular 
weight of 300,300. If, however, (—1) ribosomal frame- 
shifting takes place at the slippage site in ORF 1a (see 
above), the ORF1a/ORF 1b fusion protein has a poten- 
tial molecular weight of 754,200. The hydrophilicity 
profile of the predicted ORF 1b translation product
(data not shown) shows both hydrophilic and hydro- 
phobic regions but none are indicative of extensive 
membrane spanning regions. A comparison of the pre- 
dicted HCV ORF 1b protein with the corresponding pro- 
teins of IBV and MHV using the GAP program (default 
settings) indicates 69.7% similarity (63.8% identity) for 
the HCV and IBV proteins and 70.5% similarity (54.2% 
identity) for the HCV and MHV proteins after optimal 
alignment. This high degree of similarity is essentially 
uniform over the entire length of all three proteins, as is 
evident in the dot matrix comparisons shown in Fig. 6 
(program COMPARE, window, 100; stringency, 30; 
default comparison table). 

Putative functional domains. As with ORF 1a, the 
HCV ORF 1b gene product can be compared with the 
ORF 1b proteins of MHV and IBV and putative func- 
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tional motifs can be identified. The first such motif is 
the RNA polymerase element located between amino 
acids 534 and 836 (Fig. 7A). The HCV motif aligns well 
with the MHV and IBV motifs and can be divided into 
eight distinct regions recognized by Koonin (1991) as 
characteristic of a wide variety of putative RNA poly- 
merases. The alteration of the RNA polymerase "core" 
sequence Gly-Asp-Asp to Ser-Asp-Asp is maintained 
in the HCV ORF 1b protein. 
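
Locating a short core motif such as Ser-Asp-Asp (SDD in one-letter code) in a predicted protein amounts to a simple string search. The sketch below is illustrative only: the function name is hypothetical, and the toy sequence is invented and is not taken from the HCV ORF 1b protein.

```python
def find_motif(protein, motif="SDD"):
    """Return the 1-based start positions of every occurrence of motif
    in a protein sequence given in one-letter amino acid code."""
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based
        start = protein.find(motif, start + 1)
    return positions


if __name__ == "__main__":
    toy_sequence = "MKSDDLLGDDASDD"  # invented sequence, for illustration only
    print(find_motif(toy_sequence))
```

A hit found in this way is only a candidate. Its assignment to a polymerase element rests on the alignment of the surrounding conserved regions, not on the tripeptide alone.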

The second motif recognized in the HCV protein is 
related to the "finger" domain characteristic of numer- 
ous DNA- and RNA-binding proteins. This motif, located 
between amino acids 924 and 999 in the HCV protein, 
consists of a defined sequence of Cys and His resi- 
dues. As for the homologous region of the MHV pro- 
tein, not all of the residues which were originally pro- 
posed to be involved in the IBV ORF 1b metal binding 
domain are conserved in the HCV sequence (Fig. 7B) 
(Gorbalenya et al., 1989). 

The third motif identified in the predicted HCV pro- 
tein is the purine NTP binding sequence pattern which 
is thought to be a feature of duplex unwinding (i.e., 
helicase) activities (Gorbalenya and Koonin, 1989). 
This motif is located in the HCV ORF 1b protein at posi- 
tion 1202-1330 and is highly conserved in comparison 
to the same motif in the MHV and IBV proteins 
(Fig. 7C). 

In addition to the sequence similarities in the RNA 
polymerase genes of HCV, IBV and MHV, recent analy- 
ses of arterivirus and torovirus RNA polymerase genes 
(Snijder et al., 1990; Kuo et al., 1991; Den Boon et al., 
1991) have revealed evolutionary links between arteri-, 
toro-, and coronaviruses. The polymerase core motif, 
the finger domain and the NTP binding sequence pat- 
tern described above are found, for example, in the 
polymerase genes of equine arteritis virus and Berne 
virus. Also, a conserved domain located at the car- 
boxy-terminus of coronavirus, arterivirus and torovirus 
ORF 1b proteins has been recognized, but a function 
has not yet been proposed (Snijder et al., 1990). 


DISCUSSION 


Coronaviruses have traditionally been divided into 
four antigenic groups (Holmes, 1990). HCV 229E be- 
longs to group 1, together with transmissible gastroen- 
teritis virus (TGEV), canine coronavirus (CCV), feline in- 
fectious peritonitis virus (FIPV) and feline enteric coro- 
navirus (FECV) (see, however, Sanchez et al., 1990). 
Thus the nucleotide sequence of a group 1 
(HCV 229E), a group 2 (MHV), and a group 3 (IBV) cor- 
onavirus is now available. HCV 229E is also the first 
human coronavirus to be entirely sequenced and we 
hope that many questions concerning the biology and 
pathogenesis of these viruses can now be investigated 
more easily. 

The HCV 229E gene 1 is comparable in size and 
organization to gene 1 of IBV and MHV. The predicted 
gene product displays a number of structural features 
and putative functional domains (Fig. 8). These include 
functions related to RNA synthesis (POL, MBD, and 
HEL) in the ORF 1b gene product and proteolytic activi- 
ties (PLP and 3CL) in the ORF 1a gene product. Our 
experiments and those of others (Brierley et al., 
1987; Lee et al., 1991; Bredenbeek et al., 1990) show 
that expression of these functions can be regulated via 
the mechanism of ribosomal frame-shifting. At the 
same time, we predict that in vivo they are also likely to 
be coordinated by the activation or inactivation of one 
set of functions (RNA synthesis) by the other (pro- 
teases). Clearly, it will be a difficult task to unravel 
these complex interactions. However, the availability 
of a complete set of cDNAs encompassing the HCV 
polymerase gene serves as a useful starting point. 

First, the cDNAs can be used to generate a collec- 
tion of immunological reagents which facilitate the 
analysis of polymerase gene expression in HCV-in- 
fected cells. Without such reagents it will be very diffi- 
cult to identify and characterize the low amounts of 
gene 1 products which can be expected. In this re- 
spect, an important step forward has also been the 
recent identification of the cellular receptor for HCV 
229E as aminopeptidase N (Yeager et al., 1992). This 
finding may allow the development of better cell culture 
systems for the biochemical analysis of HCV replica- 
tion. 

Second, the HCV polymerase cDNAs together with 
recently developed vaccinia virus vectors (Merchlinsky 
and Moss, 1992) should make it possible to (over)ex- 
press the HCV 229E polymerase gene in eucaryotic 
cells. This will also facilitate the analysis of polymerase 
gene expression and, more importantly, provide an op- 
portunity to investigate the function of polymerase 
gene products via reverse genetics. Experiments to- 
ward these goals are in progress. 